The consultative papers for the Basel II Accord require rating systems to provide a ranking of obligors in the sense that the rating categories indicate the creditworthiness in terms of default probabilities. As a consequence, the default probabilities ought to be a monotone function of the ordered rating categories. This requirement appears quite intuitive. In this paper, however, we show that the intuition can be founded on mathematical facts. We prove that, in the closely related context of a continuous score function, monotonicity of the conditional default probabilities is equivalent to optimality of the corresponding decision rules in the test-theoretic sense. As a consequence, optimality can be checked by inspecting the ordinal dominance graph (also called the Receiver Operating Characteristic curve) of the score function: it holds if and only if the curve is concave. We conclude the paper by exploring the connection between the area under the ordinal dominance graph and the so-called Information Value, which is used by some vendors of scoring systems.

Keywords: Conditional default probability, score function, most powerful test, Information 
Value, Accuracy Ratio. 



1 Introduction 


In its new attempt to provide quantitative rules for the capital that banks must hold against their credit risks, the so-called Basel II Accord, the Basel Committee requires banks to determine default probabilities for all obligors in their credit portfolios. These default probabilities can be derived from internal rating systems (Basel Committee, 2001). As a consequence, there is a growing need to develop internal rating systems that meet the requirements of the Basel II Accord.

However, the Basel II Accord not only allows internal ratings but also sets out rules for properties of the rating systems, which will eventually be checked by the supervisory authorities. Hence, the rating systems used by the banks have to meet certain quality standards. These standards cover the current state of the system as well as the process of its development. As a consequence, much work has been done by several researchers in order to establish a common standard based on reasonable economic and statistical assumptions.

Monotonicity of the rating system, in the sense that better ratings should correspond to lower default probabilities, is one of the most important requirements (see e.g. Fritz and Popken, 2002; Krahnen and Weber, 2001). If the rating system is based on a score function, Fritz and Popken (2002) even demand monotonicity at the score level. In the paper at hand, we show that this requirement can be given a decision-theoretic foundation. We prove that monotonicity of the conditional default probabilities is equivalent to optimality of the corresponding decision rules in the test-theoretic sense.

This paper is organized as follows. In Section 2 we present the mathematical framework and some basic statistical facts on score functions. Section 3 gives the main result in Proposition 3.3. In Section 4 we discuss the connection between two important summary statistics for the discriminatory power of score functions: the Information Value (IV) and the Accuracy Ratio (AR).



2 Assumptions and basic facts 

In the sequel we describe the result of the scoring process by a real random variable (or statistic) $S$ on a probability space $(\Omega, \mathcal{F}, P)$. A second statistic $T$ (for type) with values in the set $\{D, N\}$ ($D$ for default and $N$ for non-default) indicates whether the firm that has been scored will be insolvent or solvent by a previously fixed time horizon. However, as the value of $T$ can be observed only with some delay, the financial institution faces the problem of inferring its value from the known value of $S$.

In order to describe formally the problem we fix some assumptions and notations: 

• We write $D$ for short for the event $\{T = D\}$ and $N$ for $\{T = N\}$.

• Denote the overall default probability by $p$, i.e. $p = P[D] \in (0, 1)$.

• Assume that $S$ has a conditional density $f(s \mid t)$ given $T$, i.e.

$$P[S \in A \mid T = t] = \int_A f(s \mid t)\, ds$$

for both $t = D$ and $t = N$ and any Borel subset $A$ of the real line. For the sake of brevity we write

$$f_D(s) = f(s \mid D) \quad \text{and} \quad f_N(s) = f(s \mid N).$$

• Given the conditional densities $f_D$ and $f_N$ of $S$ for the two possible values of $T$, we denote the conditional distribution functions of $S$ given the values of $T$ by $F_D$ and $F_N$ respectively, i.e.

$$F_D(s) = P[S \le s \mid D] = \int_{-\infty}^{s} f_D(x)\, dx, \qquad F_N(s) = P[S \le s \mid N] = \int_{-\infty}^{s} f_N(x)\, dx. \tag{2.1}$$



Assumption 2.1 (Smoothness of model) The densities $f_D$ and $f_N$ are positive continuous functions on some open interval $I \subset \mathbb{R}$. For any $x > 0$ the set

$$U_x = \{s \in I : x f_N(s) = f_D(s)\}$$

has Lebesgue measure 0.



Note that under Assumption 2.1 both $F_D$ and $F_N$ are continuously differentiable and strictly increasing functions on $I$. In particular, their inverse functions $F_D^{-1}$ and $F_N^{-1}$ exist and are uniquely defined.

Definition 2.2 (Likelihood ratio) We call

$$L(s) = \frac{f_D(s)}{f_N(s)}, \quad s \in I, \tag{2.2}$$

the likelihood ratio at score $s$.



It follows from Assumption 2.1 that the conditional distributions $P[L \circ S \in \cdot \mid D]$ and $P[L \circ S \in \cdot \mid N]$ of the likelihood ratio applied to the score statistic, given the type of the firm under consideration, are continuous. To see this, note that, e.g.,

$$P[L \circ S = x \mid D] = P[S \in U_x \mid D] = \int_{U_x} f_D(s)\, ds = 0$$

for arbitrary $x > 0$.

A further consequence of Assumption 2.1 is a well-known formula (e.g. Kullback, 1959, p. 4) for the conditional default probability $P[D \mid S = s]$ of a firm given that its score equals $s$, namely

$$P[D \mid S = s] = \frac{p\, f_D(s)}{p\, f_D(s) + (1 - p)\, f_N(s)} = \frac{p\, L(s)}{p\, L(s) + 1 - p}, \quad s \in I. \tag{2.3}$$

Since $P[S = s] = 0$ for all $s \in I$, this conditional probability has to be understood in the non-elementary sense (cf. Durrett, 1996, ch. 4).
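As a numerical illustration of the likelihood ratio and of Bayes' formula $P[D \mid S = s] = p f_D(s) / (p f_D(s) + (1 - p) f_N(s))$, the following Python sketch assumes a hypothetical binormal model: defaulter scores $N(-1, 1)$, non-defaulter scores $N(1, 1)$ and $p = 0.05$. These parameters and the function names are illustrative assumptions, not taken from the paper. In this model $L(s) = e^{-2s}$, so the conditional default probability decreases in $s$.

```python
from statistics import NormalDist

# Hypothetical binormal score model (illustrative assumption):
# defaulters score lower on average than non-defaulters.
SCORE_GIVEN_D = NormalDist(mu=-1.0, sigma=1.0)  # density f_D
SCORE_GIVEN_N = NormalDist(mu=1.0, sigma=1.0)   # density f_N
P_DEFAULT = 0.05                                # overall default probability p


def likelihood_ratio(s: float) -> float:
    """L(s) = f_D(s) / f_N(s)."""
    return SCORE_GIVEN_D.pdf(s) / SCORE_GIVEN_N.pdf(s)


def conditional_pd(s: float, p: float = P_DEFAULT) -> float:
    """Conditional default probability P[D | S = s] = p L(s) / (p L(s) + 1 - p)."""
    lr = likelihood_ratio(s)
    return p * lr / (p * lr + 1.0 - p)


if __name__ == "__main__":
    for s in (-2.0, -1.0, 0.0, 1.0, 2.0):
        print(f"s = {s:+.1f}  L(s) = {likelihood_ratio(s):8.4f}  "
              f"P[D | S = s] = {conditional_pd(s):.4f}")
```

At $s = 0$ the two densities coincide, so $L(0) = 1$ and the conditional default probability equals the overall default probability $p$.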



If we assume that insolvent firms in general receive lower scores than solvent firms, a very simple procedure for a test of the hypothesis $T = N$ against the alternative $T = D$ is to fix a score level $s_0 \in I$ and to reject $T = N$ whenever $S < s_0$. Of course, if insolvent firms receive higher scores, the hypothesis $T = N$ should be rejected whenever $S > s_0$. In the sequel, we restrict ourselves to the first case, since for most of the results the transfer to the second case is obvious.
The level $s_0$ should be chosen in such a way that a maximal level of the Type I error (i.e. rejecting $T = N$ although it is true) is guaranteed. For a given level $u \in (0, 1)$ this can be achieved by setting

$$s_0 = F_N^{-1}(u). \tag{2.4}$$

The classic way to compare the discriminatory power of different decision statistics is to fix the Type I error at some level and to look at the size of the Type II error (i.e. $T = N$ is not rejected although it is false). In the context of the test procedure specified by (2.4), the size of the Type II error is $P[S \ge s_0 \mid D] = 1 - F_D(s_0)$. For convenience, in the sequel we call tests of this kind cut-off tests. Of course, it does not matter whether the Type II error is minimized or $F_D(s_0)$ (often called the power of the test) is maximized over all possible statistics $S$. By convention, we consider here the maximization variant.
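The cut-off test with $s_0 = F_N^{-1}(u)$ can be sketched in a few lines of Python. The binormal distributions below are a hypothetical assumption chosen for illustration, and `cut_off_test` is an illustrative helper, not part of any library. It returns the cut-off, the Type I error (which equals $u$ by construction) and the power $F_D(s_0)$.

```python
from statistics import NormalDist
from typing import Tuple

# Hypothetical binormal score model (illustrative assumption).
SCORE_GIVEN_D = NormalDist(mu=-1.0, sigma=1.0)  # conditional distribution F_D
SCORE_GIVEN_N = NormalDist(mu=1.0, sigma=1.0)   # conditional distribution F_N


def cut_off_test(u: float) -> Tuple[float, float, float]:
    """Cut-off test 'reject T = N if S < s_0' at Type I error level u.

    Returns (s_0, Type I error, power), where s_0 = F_N^{-1}(u),
    the Type I error is P[S < s_0 | N] = F_N(s_0) and the power is
    F_D(s_0) = 1 - (Type II error).
    """
    s0 = SCORE_GIVEN_N.inv_cdf(u)
    type_one_error = SCORE_GIVEN_N.cdf(s0)
    power = SCORE_GIVEN_D.cdf(s0)
    return s0, type_one_error, power


if __name__ == "__main__":
    for u in (0.01, 0.05, 0.10):
        s0, err1, power = cut_off_test(u)
        print(f"u = {u:.2f}: s_0 = {s0:+.3f}, Type I = {err1:.3f}, power = {power:.3f}")
```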

The function $\delta : (0, 1) \to [0, 1]$ defined by

$$\delta(u) = \delta_S(u) = F_D\bigl(F_N^{-1}(u)\bigr), \quad u \in (0, 1), \tag{2.5}$$

maps every possible level of the Type I error to the power of the corresponding cut-off test. If for two test statistics $S_1$ and $S_2$ we have $\delta_{S_1}(u) > \delta_{S_2}(u)$ for any $u$, then we know that $S_1$ delivers uniformly more powerful cut-off tests than $S_2$. Plotting the graphs of $\delta_{S_1}$ and $\delta_{S_2}$ provides a convenient way to check this.
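To illustrate the ordinal dominance function, the following sketch evaluates $\delta_S$ on a grid of Type I error levels for two hypothetical statistics $S_1$ and $S_2$, each assumed binormal with unit variances; the means are invented for illustration. For such models $\delta_S(u) = \Phi(\Phi^{-1}(u) + \mu_N - \mu_D)$, so the statistic with the larger mean separation dominates uniformly.

```python
from statistics import NormalDist


def ordinal_dominance(f_d: NormalDist, f_n: NormalDist, u: float) -> float:
    """delta_S(u) = F_D(F_N^{-1}(u)): power of the cut-off test at Type I level u."""
    return f_d.cdf(f_n.inv_cdf(u))


# Two hypothetical score statistics, each given as (S | D, S | N).
# S_1 separates defaulters from non-defaulters more strongly than S_2.
S1 = (NormalDist(-1.5, 1.0), NormalDist(1.5, 1.0))
S2 = (NormalDist(-0.5, 1.0), NormalDist(0.5, 1.0))

GRID = [k / 20 for k in range(1, 20)]  # Type I error levels 0.05, ..., 0.95

s1_dominates = all(
    ordinal_dominance(*S1, u) > ordinal_dominance(*S2, u) for u in GRID
)

if __name__ == "__main__":
    for u in (0.05, 0.25, 0.5):
        print(f"u = {u:.2f}: delta_S1 = {ordinal_dominance(*S1, u):.3f}, "
              f"delta_S2 = {ordinal_dominance(*S2, u):.3f}")
    print("S_1 uniformly more powerful on the grid:", s1_dominates)
```

A finite grid can only suggest, not prove, uniform dominance; in the binormal case it follows from the closed form above.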

In the literature, the graph of $\delta$ is known as the ordinal dominance graph or as the receiver operating characteristic graph (see e.g. Bamber, 1975). There is a well-known connection between the function $\delta$ and the likelihood ratio defined by (2.2), namely

$$\delta'(u) = L\bigl(F_N^{-1}(u)\bigr), \quad u \in (0, 1). \tag{2.6}$$
As already noticed by Bamber, it follows from (2.6) that $\delta$ is concave in $u$ if and only if the likelihood ratio $L$ is non-increasing in $s$ (or, by (2.3), equivalently, if and only if the conditional default probability is non-increasing in $s$). Similarly, $\delta$ is convex in $u$ if and only if $L$ and the conditional default probability are non-decreasing in $s$.
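A quick numerical check of the relation $\delta'(u) = L(F_N^{-1}(u))$ under a hypothetical binormal assumption (defaulter scores $N(-1, 1)$, non-defaulter scores $N(1, 1)$, so $L(s) = e^{-2s}$ is decreasing): a central finite difference of $\delta$ is compared with $L(F_N^{-1}(u))$, and the slopes decrease in $u$, as concavity requires. The step size `h` and the levels are arbitrary illustrative choices.

```python
from statistics import NormalDist

# Hypothetical binormal model (assumption): L(s) = exp(-2 s) is decreasing.
F_D = NormalDist(-1.0, 1.0)
F_N = NormalDist(1.0, 1.0)


def delta(u: float) -> float:
    """Ordinal dominance function delta(u) = F_D(F_N^{-1}(u))."""
    return F_D.cdf(F_N.inv_cdf(u))


def likelihood_ratio(s: float) -> float:
    """L(s) = f_D(s) / f_N(s)."""
    return F_D.pdf(s) / F_N.pdf(s)


def delta_slope(u: float, h: float = 1e-6) -> float:
    """Central finite-difference approximation of delta'(u)."""
    return (delta(u + h) - delta(u - h)) / (2.0 * h)


LEVELS = (0.1, 0.3, 0.5, 0.7, 0.9)
numeric = [delta_slope(u) for u in LEVELS]
exact = [likelihood_ratio(F_N.inv_cdf(u)) for u in LEVELS]

if __name__ == "__main__":
    for u, num, ex in zip(LEVELS, numeric, exact):
        print(f"u = {u:.1f}: finite difference {num:.6f}, L(F_N^-1(u)) {ex:.6f}")
```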

Note that tests for $T = N$ against $T = D$ need not be of cut-off form. In general, it seems reasonable to allow for all tests which are specified by some rejection range $R \subset \mathbb{R}$. Such a test would reject $T = N$ if $S \in R$. In case of a cut-off test, $R$ is described by

$$R = (-\infty, s_0). \tag{2.7}$$

As an example of other sensible forms of the rejection range, consider the case when insolvent firms tend to receive scores close to 0 whereas solvent firms achieve negative or positive scores with large absolute values. In this case the choice $R = (s_l, s_u)$ for some numbers $s_l < 0 < s_u$ appears more appropriate than (2.7). As a consequence, comparing this score function with a score function which assigns low values to insolvent and high values to solvent firms by means of the function $\delta$ from (2.5) would give a biased impression.
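The situation just described can be made concrete with a hypothetical model, an assumption for illustration only: defaulter scores $N(0, 0.5^2)$, concentrated around 0, and non-defaulter scores $N(0, 2^2)$, widely spread. Here $L$ is largest at $s = 0$ and decreasing in $|s|$, so it is not monotone in $s$. The sketch finds a chord of $\delta$ lying above the graph on $(0, 1/2)$, i.e. a violation of concavity, whereas the chord tested on $(1/2, 1)$ lies below the graph.

```python
from statistics import NormalDist

# Hypothetical model (assumption): insolvent firms score close to 0,
# solvent firms receive scores with large absolute values.
F_D = NormalDist(0.0, 0.5)
F_N = NormalDist(0.0, 2.0)


def delta(u: float) -> float:
    """Ordinal dominance function delta(u) = F_D(F_N^{-1}(u))."""
    return F_D.cdf(F_N.inv_cdf(u))


def violates_concavity(u1: float, u2: float) -> bool:
    """True if delta at the midpoint lies strictly below the chord from u1 to u2."""
    mid = 0.5 * (u1 + u2)
    return delta(mid) < 0.5 * (delta(u1) + delta(u2))


if __name__ == "__main__":
    print("chord on (0.05, 0.45) above graph:", violates_concavity(0.05, 0.45))
    print("chord on (0.55, 0.95) above graph:", violates_concavity(0.55, 0.95))
```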



How can this problem be revealed? In Section 3, we will show that the function $\delta_S$ is concave if and only if the $S$-based most powerful tests of $T = N$ against $T = D$ are cut-off tests. Hence, a non-concave $\delta_S$ (and, equivalently, non-monotone conditional default probabilities) would indicate that the information contained in the statistic $S$ is not optimally exploited.
3 Monotone conditional default probabilities and optimal cut-off tests



For a proper formulation of the connection between conditional default probabilities and optimal cut-off tests we need a bit more mathematical notation.

Definition 3.1 (Randomized test) Let statistics $S$ and $T$ as in Section 2 with values in $I$ and $\{N, D\}$ respectively be given. Then any measurable function $\varphi : I \to [0, 1]$ is called an $S$-based (randomized) test for $T = N$ against $T = D$.

An $S$-based test $\varphi$ is called a test at level $\alpha \in (0, 1)$ if $E[\varphi \circ S \mid N] \le \alpha$.

An $S$-based test $\varphi^*$ is called a most powerful test at level $\alpha \in (0, 1)$ for $T = N$ against $T = D$ if it is a test at level $\alpha$ and for any $S$-based test $\varphi$ at level $\alpha$ we have

$$E[\varphi^* \circ S \mid D] \ge E[\varphi \circ S \mid D]. \tag{3.1}$$

We interpret the value $\varphi(s)$, $s \in I$, of a randomized test as the probability of success to be applied in an additional Bernoulli experiment conducted by the user of the test in order to decide whether $T = N$ should be rejected. Hence, if e.g. $\varphi(s) = 0.6$, the user would toss a coin with success probability 0.6 and would reject $T = N$ only in case of success. If $\varphi(s) = 1$, the hypothesis has to be rejected unconditionally. If $\varphi(s) = 0$, it must not be rejected. Tests at level $\alpha$ are just those tests which guarantee a maximal level $\alpha$ of the Type I error (i.e. rejecting $T = N$ although it is true). Most powerful tests at level $\alpha$ are those tests which minimize the Type II error size among all tests at level $\alpha$.
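The Bernoulli interpretation of a randomized test can be written down directly. In the sketch below, `apply_randomized_test` and `coin_test` are hypothetical helpers: one uniform random number is drawn and $T = N$ is rejected with probability $\varphi(s)$, so that deterministic tests (values in $\{0, 1\}$) are recovered as a special case.

```python
import random
from typing import Callable


def apply_randomized_test(
    phi: Callable[[float], float], s: float, rng: random.Random
) -> bool:
    """Reject T = N (return True) with probability phi(s).

    One Bernoulli experiment: random() is uniform on [0, 1), so the
    event random() < phi(s) has probability phi(s) for phi(s) in [0, 1].
    """
    return rng.random() < phi(s)


def coin_test(s: float) -> float:
    """Hypothetical randomized test with constant value 0.6."""
    return 0.6


if __name__ == "__main__":
    rng = random.Random(2024)
    trials = 100_000
    rejections = sum(apply_randomized_test(coin_test, 0.0, rng) for _ in range(trials))
    print(f"empirical rejection frequency: {rejections / trials:.3f}")
```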

Randomized tests are not widespread in practice since it appears very strange to throw a coin 
in order to decide about the rejection of a hypothesis. However, the common deterministic tests 
are a subclass of the randomized tests (they are just those tests with values in {0, 1} only), and 
the notion of randomized test is quite convenient for the formulation of a complete theory of 
test optimality. 

Our main result will be based on the Neyman-Pearson Fundamental Lemma. We quote it here in a form adapted to Assumption 2.1 in order to avoid some technical difficulties. See Witting (1978, ch. 2.7) for more general versions.



Theorem 3.2 (Neyman-Pearson Fundamental Lemma)

Fix a Type I error level $\alpha \in (0, 1)$. Then, under Assumption 2.1, an $S$-based test $\varphi$ for $T = N$ against $T = D$ is most powerful at level $\alpha$ if and only if for Lebesgue-almost all $s \in I$

$$\varphi(s) = \begin{cases} 1, & \text{if } L(s) > L_\alpha, \\ 0, & \text{otherwise}, \end{cases} \tag{3.2}$$

where $L$ is the likelihood ratio defined by (2.2) and $L_\alpha$ is any constant such that $P[L \circ S > L_\alpha \mid N] = \alpha$.



From Theorem 3.2 we know that there is a deterministic most powerful test at level $\alpha$, namely the test with rejection range $R = L^{-1}\bigl((L_\alpha, \infty)\bigr)$.
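For a hypothetical model in which $L$ is not monotone (defaulter scores $N(0, 0.5^2)$, non-defaulter scores $N(0, 2^2)$, an assumption for illustration), the rejection range $L^{-1}((L_\alpha, \infty))$ of the most powerful test is a symmetric interval $(-c, c)$ rather than a half-line. The sketch computes $c$ from the level condition, the corresponding $L_\alpha = L(c)$, and compares the power with that of the cut-off test at the same level; all function names are illustrative.

```python
from statistics import NormalDist
from typing import Tuple

# Hypothetical model (assumption): L(s) = f_D(s) / f_N(s) is symmetric in s
# and decreasing in |s|, so {s : L(s) > L_alpha} is an interval (-c, c).
F_D = NormalDist(0.0, 0.5)
F_N = NormalDist(0.0, 2.0)


def likelihood_ratio(s: float) -> float:
    """L(s) = f_D(s) / f_N(s)."""
    return F_D.pdf(s) / F_N.pdf(s)


def neyman_pearson_test(alpha: float) -> Tuple[float, float, float]:
    """Return (c, L_alpha, power) of the test 'reject T = N if L(S) > L_alpha'.

    The level condition P[-c < S < c | N] = alpha gives
    c = F_N^{-1}((1 + alpha) / 2); then L_alpha = L(c).
    """
    c = F_N.inv_cdf((1.0 + alpha) / 2.0)
    l_alpha = likelihood_ratio(c)
    power = F_D.cdf(c) - F_D.cdf(-c)
    return c, l_alpha, power


def cut_off_power(alpha: float) -> float:
    """Power F_D(F_N^{-1}(alpha)) of the cut-off test at the same level."""
    return F_D.cdf(F_N.inv_cdf(alpha))


if __name__ == "__main__":
    alpha = 0.2
    c, l_alpha, np_power = neyman_pearson_test(alpha)
    print(f"rejection range (-{c:.3f}, {c:.3f}), L_alpha = {l_alpha:.3f}")
    print(f"power: most powerful test {np_power:.3f}, cut-off test {cut_off_power(alpha):.5f}")
```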
If the likelihood ratio $L$ is non-increasing, then the test from (3.2) has cut-off form as described in Section 2, i.e. there is a constant $s_\alpha$ such that the rejection range is $(-\infty, s_\alpha)$. Hence, from the observations in Section 2 it is clear that concavity of the ordinal dominance graph implies that the most powerful $S$-based tests for $T = N$ against $T = D$ have cut-off form. However, this concavity is not only a sufficient but also a necessary condition for the most powerful tests to be cut-off tests.



Proposition 3.3 (Optimality of cut-off tests)

Under Assumption 2.1, for every Type I error level $\alpha \in (0, 1)$ there is a most powerful $S$-based test with rejection range of the form $(-\infty, s_\alpha)$ for some $s_\alpha \in I$ if and only if the ordinal dominance function $\delta_S$ as defined by (2.5) is concave. Similarly, for every Type I error level $\alpha \in (0, 1)$ there is a most powerful $S$-based test with rejection range of the form $(s_\alpha, \infty)$ for some $s_\alpha \in \mathbb{R}$ if and only if the ordinal dominance function $\delta_S$ as defined by (2.5) is convex.



In the Appendix, we provide a proof for the statement that existence of most powerful cut-off tests as in the first part of Proposition 3.3 implies concavity of the ordinal dominance function.



Note the dependence on the underlying statistic $S$ in Proposition 3.3. Both the most powerful tests and the ordinal dominance function $\delta_S$ are defined in terms of $S$ and its conditional distributions. As a consequence, there might be another statistic $S^*$ such that there are $S^*$-based tests that are more powerful than the corresponding $S$-based tests at the same Type I error level. Thus, Proposition 3.3 makes a statement about the optimal use of the available information once the score function has been fixed. The process of finding an appropriate score function is not the subject of the proposition.



4 Information value and the area under the ordinal dominance graph



In Section 3, we have investigated which conclusions can be drawn from concavity or non-concavity of the ordinal dominance graph. In this section, we compare the area under the ordinal dominance graph as a performance measure for score functions with the so-called Information Value, to be explained below.



The natural logarithm of the likelihood ratio defined by (2.2) is sometimes called the weight of evidence (see Good, 1950). It is used by some vendors of scoring systems as a means to detect failures in exploiting the full information of a statistic. Observe that the usefulness of this method is underpinned by Proposition 3.3.



Of course, the weight of evidence is only a local measure of the information content of a statistic. Kullback (1959, p. 6) suggested the Information Value (or divergence) as the corresponding global measure. It is defined by

$$IV_S = \int_I \bigl(f_D(s) - f_N(s)\bigr) \log L(s)\, ds = E[\log L \circ S \mid D] - E[\log L \circ S \mid N]. \tag{4.1}$$
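A direct numerical evaluation of the Information Value is straightforward. The sketch below applies the midpoint rule on a truncated range; the truncation bounds, grid size and model parameters are illustrative assumptions. For two normal densities with common variance $\sigma^2$ the divergence has the closed form $(\mu_D - \mu_N)^2 / \sigma^2$, which for the hypothetical choice $\mu_D = -1$, $\mu_N = 1$, $\sigma = 1$ gives 4 and serves as a check. Swapping the two densities illustrates the symmetry of $IV_S$.

```python
import math
from statistics import NormalDist


def information_value(
    f_d: NormalDist, f_n: NormalDist, lo: float = -12.0, hi: float = 12.0, n: int = 24_000
) -> float:
    """Midpoint-rule approximation of the integral of (f_D - f_N) log(f_D / f_N) over [lo, hi]."""
    h = (hi - lo) / n
    total = 0.0
    for k in range(n):
        s = lo + (k + 0.5) * h
        fd, fn = f_d.pdf(s), f_n.pdf(s)
        total += (fd - fn) * math.log(fd / fn)
    return total * h


# Hypothetical binormal model (assumption): mu_D = -1, mu_N = 1, sigma = 1.
F_D = NormalDist(-1.0, 1.0)
F_N = NormalDist(1.0, 1.0)

if __name__ == "__main__":
    print(f"IV = {information_value(F_D, F_N):.6f}")
    print(f"IV with densities swapped = {information_value(F_N, F_D):.6f}")
```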
From the first representation in (4.1) it follows that $IV_S$ is always non-negative and symmetric in $f_N$ and $f_D$. In order to arrive at a test-theoretic interpretation of the Information Value, observe that it can be equivalently written as

$$IV_S = \int_{-\infty}^{\infty} \bigl(P[\log L \circ S \le x \mid N] - P[\log L \circ S \le x \mid D]\bigr)\, dx. \tag{4.2}$$

Hence $IV_S$ is just the sum of the signed areas between the graphs of the distribution functions of $\log L \circ S$ conditional on $T = N$ and $T = D$ respectively. Define

$$S^* = \log L \circ S, \qquad F_N^*(s) = P[S^* \le s \mid N], \qquad F_D^*(s) = P[S^* \le s \mid D], \quad s \in \mathbb{R}, \tag{4.3}$$

and assume that $F_N^*$ is differentiable and strictly increasing such that $(F_N^*)^{-1}$ exists (and hence is differentiable, too). With the substitution $u = F_N^*(s)$, (4.2) can then be written as



$$IV_S = \int_0^1 \Bigl(u - F_D^*\bigl((F_N^*)^{-1}(u)\bigr)\Bigr) \frac{d}{du}(F_N^*)^{-1}(u)\, du. \tag{4.4}$$



Replacing the derivative $\frac{d}{du}(F_N^*)^{-1}(u)$ in (4.4) by the constant 1 yields the integral

$$\int_0^1 \Bigl(u - F_D^*\bigl((F_N^*)^{-1}(u)\bigr)\Bigr)\, du = \int_0^1 \bigl(u - \delta_{S^*}(u)\bigr)\, du, \tag{4.5}$$

with $\delta_{S^*}$ defined analogously to $\delta_S$ in (2.5). Here, the right-hand side of (4.5) measures the area between the diagonal and the ordinal dominance graph of the statistic $S^*$. Twice this area is a well-known performance measure for scoring systems: the so-called Accuracy Ratio. The area plus 1/2 is just the average power of the $S^*$-based cut-off tests (here rejecting $T = N$ for large values of $S^*$, since large values of the weight of evidence indicate default), where the average is computed with equal weight 1 for all Type I error levels in $(0, 1)$. Recall, however, that in the case of (4.5) the score function $S$ has been replaced by the score function $S^*$.



By (4.4) and (4.5) we have seen that the concepts of Information Value and Accuracy Ratio are quite similar from the computational point of view. Nevertheless, they differ in two essential aspects. First, the Information Value is calculated for the transformed statistic $S^*$. Second, for $IV_S$ the difference between the ordinal dominance graph and the diagonal is weighted with the derivative of the inverse conditional distribution function of $S^*$ given $N$.

Another way to express the similarity between the two concepts is to write the Accuracy Ratio $AR_S = 2 \int_0^1 \bigl(\delta_S(u) - u\bigr)\, du$ as

$$AR_S = 2\bigl(E[F_D \circ S \mid N] - E[F_D \circ S \mid D]\bigr). \tag{4.6}$$



Hence, $AR_S$ can be generated from $IV_S$ essentially (up to the factor $-2$) by substituting $F_D$ for $\log L$ in (4.1). Both $AR_S$ and $IV_S$ can be interpreted as differences of the conditional expectations of a transformation of $S$ given $T = N$ and $T = D$ respectively. However, the transformation by $F_D$ is monotone and yields values in a bounded range for $AR_S$, whereas by Proposition 3.3 the transformation by $\log L$ is monotone if and only if there are most powerful cut-off tests. In general, there is no finite bound for the value of $IV_S$.
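Both expressions for the Accuracy Ratio, $2\int_0^1 (\delta_S(u) - u)\,du$ and $2(E[F_D \circ S \mid N] - E[F_D \circ S \mid D])$, can be checked against each other numerically. The sketch assumes a hypothetical binormal model with unit variances ($\mu_D = -1$, $\mu_N = 1$), for which the area under the ordinal dominance graph is $\Phi((\mu_N - \mu_D)/\sqrt{2})$, so that $AR_S = 2\Phi(\sqrt{2}) - 1$. It uses $E[F_D \circ S \mid D] = 1/2$, which holds because $F_D(S)$ is uniformly distributed given $D$. Grid sizes and truncation bounds are arbitrary choices.

```python
from statistics import NormalDist

# Hypothetical binormal model (assumption): mu_D = -1, mu_N = 1, sigma = 1.
F_D = NormalDist(-1.0, 1.0)
F_N = NormalDist(1.0, 1.0)


def ar_from_area(n: int = 20_000) -> float:
    """AR_S = 2 * integral over (0, 1) of (delta_S(u) - u), midpoint rule."""
    total = 0.0
    for k in range(n):
        u = (k + 0.5) / n
        total += F_D.cdf(F_N.inv_cdf(u)) - u
    return 2.0 * total / n


def ar_from_expectations(lo: float = -12.0, hi: float = 12.0, n: int = 24_000) -> float:
    """AR_S = 2 (E[F_D(S) | N] - E[F_D(S) | D]) with E[F_D(S) | D] = 1/2."""
    h = (hi - lo) / n
    e_n = 0.0
    for k in range(n):
        s = lo + (k + 0.5) * h
        e_n += F_D.cdf(s) * F_N.pdf(s) * h
    return 2.0 * (e_n - 0.5)


if __name__ == "__main__":
    closed_form = 2.0 * NormalDist().cdf(2.0 ** 0.5) - 1.0
    print(f"area formula:        {ar_from_area():.6f}")
    print(f"expectation formula: {ar_from_expectations():.6f}")
    print(f"closed form:         {closed_form:.6f}")
```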
A Appendix



We sketch here the proof of the necessity part of Proposition 3.3, i.e. that concavity of the ordinal dominance function $\delta$ is necessary for the existence of a most powerful cut-off test of $T = N$ against $T = D$ at any level $\alpha \in (0, 1)$.



From (2.6) we know that, under Assumption 2.1, concavity of $\delta$ is equivalent to the likelihood ratio $L$ being non-increasing. Hence, since $L$ is positive and continuous by assumption, it suffices to show that for any $r > 0$ there is some $l_r \in I$ such that

$$L^{-1}\bigl((r, \infty)\bigr) = (-\infty, l_r) \cap I. \tag{A.1}$$



Choose an arbitrary $r > 0$ and let $\alpha = P[L \circ S > r \mid N]$. By Theorem 3.2, the test "rejection of $T = N$ if $L \circ S > r$" is most powerful at level $\alpha$. However, by assumption, there is an $s_\alpha \in I$ such that $P[S < s_\alpha \mid N] \le \alpha$ and that the test "rejection of $T = N$ if $S < s_\alpha$" is also most powerful at level $\alpha$. Again by Theorem 3.2, it follows that the functions

$$\varphi_1(s) = \begin{cases} 1, & \text{if } L(s) > r, \\ 0, & \text{otherwise}, \end{cases} \qquad \text{and} \qquad \varphi_2(s) = \begin{cases} 1, & \text{if } s < s_\alpha, \\ 0, & \text{otherwise}, \end{cases}$$

are Lebesgue-almost everywhere equal. As $L$ is continuous by assumption, we obtain (A.1).